# Multilingual Image Captioning
Paligemma2 3b Mix 224
PaliGemma 2 is an upgraded vision-language model developed by Google and built on Gemma 2. It takes image and text inputs and generates text outputs, making it suitable for a range of vision-language tasks.
Image-to-Text
Transformers

google
15.23k
28
Imgcap Soli
Apache-2.0
An image-to-text model built on the Transformers library that converts image content into descriptive text.
Image-to-Text
Transformers
Supports Multiple Languages

jingjietan
17
1
Paligemma 3b Ft Science Qa 448
PaliGemma is a lightweight 3B-parameter vision-language model developed by Google, built on the SigLIP vision model and the Gemma language model. It takes image and text inputs and generates text outputs.
Image-to-Text
Transformers

google
15
2
Paligemma 3b Mix 448
PaliGemma is a versatile, lightweight vision-language model (VLM) built on the SigLIP vision model and the Gemma language model. It takes image and text inputs and generates text outputs.
Image-to-Text
Transformers

google
5,488
109
Paligemma 3b Ft Docvqa 896
PaliGemma is a lightweight vision-language model developed by Google, built on the SigLIP vision model and the Gemma language model, supporting multilingual image-text understanding and generation.
Image-to-Text
Transformers

google
519
9
Paligemma 3b Ft Vqav2 448
PaliGemma is a lightweight vision-language model developed by Google that combines image understanding with text generation and supports multilingual tasks.
Image-to-Text
Transformers

google
121
17